Improve SSR raymarching performance #99693

Flarkk · 2024-11-25T21:07:27Z

This PR brings a major rewrite of the Screen Space Reflection raymarching code, targeting performance optimization:

- Implements a DDA algorithm that marches the ray simultaneously in NDC and homogeneous view space, as described in "Efficient GPU Screen-Space Ray Tracing" (Morgan McGuire et al.).
- Produces a linear depth buffer during the scale pre-pass (this was already the case for the single-eye setup, but not for VR). Combined with homogeneous view space marching, this removes the need for any reprojection in the ray marching loop.
- Removes the normal-roughness buffer fetches during marching, which were used to perform backface culling. Backface culling is now performed by comparing the depth of the current and previous samples and rejecting hits when the ray exits the volume.
- Solves two issues:
  - Rays didn't pass behind objects, resulting in long hollow trails (see captures below). These are now gone and, in the worst case, replaced by smaller holes corresponding to the object's footprint on the surface behind it.
  - The SSR code no longer depends on any method that reconstructs camera attributes from the projection matrix (like Projection::get_z_far() or Projection::is_orthogonal()). These can break under certain circumstances: typically, when the zfar / znear ratio is very large, the projection matrix becomes infinite and zfar can no longer be extracted from it.
- Hopefully improves code readability and establishes a good foundation for further improvements. A few ideas I leave to future PRs:
  - Ray striding and jittering, as described in the above paper. This allows marching farther with the same number of samples without deteriorating image quality too much.
  - A binary search pass to refine the hit points, as suggested here.
  - Hierarchical-Z approaches involving pre-computed z-buffer mip-maps to speed up marching even further.
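
For readers unfamiliar with the technique, the idea behind the first two points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the shader code of this PR: the pinhole projection, the `depth_at` lookup, and the names `march_ray`, `focal`, `center` and `thickness` are all hypothetical. The sketch steps one pixel at a time along the dominant screen axis and interpolates 1/depth linearly, so the ray's linear depth is recovered with a single division and no reprojection.

```python
def march_ray(origin, end, focal, center, depth_at, thickness, max_steps):
    """March a ray across the screen one pixel at a time (DDA).

    origin, end: view-space points (x, y, z), with z > 0 the linear depth in
    front of the camera (both points are assumed to be in front of it).
    depth_at(px, py): hypothetical linear depth buffer lookup, returning None
    outside the screen.
    Returns (px, py, ray_depth) for the first hit, or None.
    """
    def project(p):
        x, y, z = p
        # Pixel position plus 1/z, which varies linearly in screen space.
        return focal * x / z + center[0], focal * y / z + center[1], 1.0 / z

    x0, y0, k0 = project(origin)
    x1, y1, k1 = project(end)
    dx, dy = x1 - x0, y1 - y0
    span = max(abs(dx), abs(dy))  # pixel count along the dominant axis
    if span == 0.0:
        return None
    step_x, step_y, step_k = dx / span, dy / span, (k1 - k0) / span

    prev_z = origin[2]
    for i in range(1, min(int(span), max_steps) + 1):
        x, y, k = x0 + step_x * i, y0 + step_y * i, k0 + step_k * i
        ray_z = 1.0 / k  # linear depth of the ray, no matrix reprojection
        scene_z = depth_at(int(x), int(y))
        if scene_z is None:
            return None  # the ray left the screen
        # Hit when the depth range covered during this step overlaps the
        # slab [scene_z, scene_z + thickness] behind the visible surface.
        z_min, z_max = min(prev_z, ray_z), max(prev_z, ray_z)
        if z_max >= scene_z and z_min <= scene_z + thickness:
            return int(x), int(y), ray_z
        prev_z = ray_z
    return None
```

Here `thickness` plays a role comparable to a depth tolerance: a smaller value rejects more candidate hits, a larger one accepts rays that pass slightly behind the visible surface.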

Visual differences

Cube roughness is 0.2, floor roughness is 0.0.
Depth threshold is 0.1.
Raymarching 512 steps.

This is the single-eye case. Any help to test it in VR is welcome.
Also, any test with more complex scenes would be appreciated.

Before	With this PR

Performance improvements

These should be significant for both single-eye and VR setups. I couldn't measure them statistically (I haven't yet sorted out the render graph messing up debug markers, despite active support from @clayjohn, @Ansraer and @DarioSamo), but the GPU traces below suggest a ~20% reduction in compute time.

In this context, please take the analysis below with a grain of salt, as it is only my interpretation (also, please ignore the markers):

might be the scale pass, left pretty much unchanged in single-eye setup
is likely the core ray marching logic, optimized by ~50% after this PR
looks like the filter pass, although it's pretty much left untouched by this PR and I don't get why it would be longer

Any help on making this analysis stronger is welcome.

Top chart: before
Bottom chart: with this PR

hsvfan-jan · 2024-11-25T21:32:34Z

The white highlight beneath the cube seems to be accentuated by the new algorithm. Is it possible to turn that down to a similar level as before, where it wasn't as obvious? Perhaps the reflection is just mistakenly offset by a few pixels and that's what's causing the highlight look.

Flarkk · 2024-11-26T08:52:41Z

Just fixed the white contact line.
The backface culling logic was rejecting all hits on the first marching iteration, hence the missed reflections close to contact areas.
This is no longer the case.
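
As a rough illustration of this kind of logic, here is a hypothetical Python sketch of a depth-based backface test with a special case for the first iteration. It is not the shader code of this PR, and it only shows one way a previous-sample comparison may misfire: `is_front_hit`, `thickness` and the `first_step` flag are assumptions made for the example. Depths are linear, with larger values farther from the camera.

```python
def is_front_hit(prev_ray_z, prev_scene_z, ray_z, scene_z, thickness,
                 first_step=False):
    """Decide whether the current sample is a front-facing hit.

    A hit requires the ray to be inside the thin slab behind the visible
    surface. It is rejected as an "exit" when the previous sample was not in
    front of the scene, i.e. the ray is leaving a volume it was already behind.
    """
    inside_now = scene_z <= ray_z <= scene_z + thickness
    if not inside_now:
        return False
    if first_step:
        # Assumption: on the first step the previous sample is the ray origin,
        # which lies on the reflecting surface itself, so it is treated as
        # being in front instead of being compared strictly.
        return True
    return prev_ray_z < prev_scene_z


# Ray entering a surface from the front: accepted.
front = is_front_hit(4.0, 5.0, 5.1, 5.0, 0.5)
# Ray coming out from behind an object: rejected as an exit.
leaving = is_front_hit(6.0, 5.0, 5.3, 5.0, 0.5)
# First step near a contact area, origin exactly on the surface.
contact_strict = is_front_hit(5.0, 5.0, 5.05, 5.0, 0.5)
contact_fixed = is_front_hit(5.0, 5.0, 5.05, 5.0, 0.5, first_step=True)
```

In this sketch the strict comparison drops the contact-area hit, while the `first_step` special case keeps it.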

RPicster · 2024-11-26T10:10:14Z

I gave it a test: it doesn't improve perceived quality for me, and the PR introduces new artifacts. The artifacts are mostly visible on the dark side of the cube.

Flarkk · 2024-11-26T10:29:06Z

@RPicster thanks for testing. Can you share the project files?
The artifacts might be related to the Depth tolerance parameter in Environment being used slightly differently in the new implementation. I'll see how I can make it work exactly as before, so that users don't have to adjust it.

Also this PR is mostly focused on performance, so don't expect quality improvements in most cases.

RPicster · 2024-11-26T11:12:23Z

ssr-test.zip
I adjusted it a bit; I hope it helps.

mrjustaguy · 2024-11-26T11:53:18Z

I think the issue you're seeing actually comes from a quality improvement in this PR: it produces more accurate reflections, whereas the current SSR tends to elongate things, filling the gaps you're seeing.

The effect can be seen in the before and after screenshots of the PR.

Flarkk requested a review from a team as a code owner November 25, 2024 21:07

tetrapod00 added enhancement topic:rendering labels Nov 25, 2024

tetrapod00 added this to the 4.x milestone Nov 25, 2024

tetrapod00 added the topic:3d label Nov 25, 2024

Improve SSR raymarching performance and quality

8ce6f6f

Flarkk force-pushed the optimize_ssr branch from b4e8bfd to 8ce6f6f Compare November 26, 2024 08:46

Flarkk changed the title ~~Improve SSR raymarching performance and quality~~ Improve SSR raymarching performance Nov 26, 2024
